DETAILED ACTION

1. This communication is in response to the Argument and Amendments filed on
09/26/2008. Claims 55-69 remain pending and have been examined, with claims 1-54 
being cancelled. The Applicants' amendment and remarks have been carefully 
considered but they do not place the application in condition for allowance. 

2. All previous objections and rejections directed to the Applicant's disclosure and 
claims not discussed in this Office Action have been withdrawn by the Examiner. 



Response to Amendments and Arguments 

3. Applicant's arguments (pages 7-8) filed on 09/26/2008 with regard to claims 55-69
have been fully considered and are moot in view of new grounds for rejection.



Claim Objections 

4. Claims 60-64 are objected to because of the following informalities: The 
limitation of "operable to cause data" should be changed to "causing a" since the former 
does not yield a positive recitation. Appropriate correction is required. 



Claim Rejections - 35 USC §112 

5. The following is a quotation of the first paragraph of 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

6. Claims 60-64 are rejected under 35 U.S.C. 112, first paragraph, as failing to
comply with the written description requirement. The claim(s) contains subject matter 
which was not described in the specification in such a way as to reasonably convey to 
one skilled in the relevant art that the inventor(s), at the time the application was filed, 
had possession of the claimed invention. The claimed limitation of "computer program
product" is not defined in the Specification. The term "computer program" is the only
related term found in the Specification.

7. The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

8. Claims 60-64 are rejected under 35 U.S.C. 112, second paragraph, as being 
indefinite for failing to particularly point out and distinctly claim the subject matter which 
applicant regards as the invention. It is unclear what the Applicant is seeking to
claim with the claim limitation of a "computer program product". It is unclear
whether the computer program product contains the computer readable medium or just
program code; if the former, it is unclear how the computer readable medium can be
encoded on the computer readable medium. Hence, for the purposes of compact
prosecution, the limitation was interpreted to mean the computer program.

Claim Rejections - 35 USC § 103 

9. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth
in section 102 of this title, if the differences between the subject matter sought to be patented and the
prior art are such that the subject matter as a whole would have been obvious at the time the invention
was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

10. Claims 55, 59, 60, 64, 65, and 69 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Su et al. (In Proceedings of the 32nd Annual Meeting on Association 
For Computational Linguistics 1994) in view of Frantzi et al. ("Extracting Nested 
Collocations"). 

As to claims 55, 60, and 65, Su et al. teaches a computer-implemented method 
for identifying compounds in text, comprising: 

extracting a vocabulary (see page 244, 2nd full paragraph, sect.
Simulation, (1st paragraph), lines 5-8, compound list) of tokens (see page 244,
Table 1) from text (see page 243, left column, 2nd paragraph, line 6);

identifying a plurality of unique n-grams in the text (see page 245, right
column, "Simulation," 1st paragraph: the compound list is modified or rebuilt after a
new compound word is detected. That the compounds have a plurality of lengths is
obvious in the document being studied.), each n-gram being an occurrence in the
text of n sequential tokens, each token being found in the vocabulary (see page
244, left column, lines 4-18: the relative frequency of the n-gram is computed. It is
obvious to one skilled in the art that the n-gram is associated with the respective
frequency);

dividing each n-gram into n-1 pairs of two adjacent segments, where each
segment consists of at least one token (see page 243, right column, 2nd full
paragraph, Mutual Information, where words are used to determine the word
association measure);

for each n-gram, calculating a likelihood of collocation for each pair of
segments of the n-gram (page 243, right column, equation, specifically, the
probabilities in the formula that are calculated, numerator and denominator) and
determining a score (page 243, right column, line 8, value obtained, lambda) for
the n-gram based on a lowest calculated likelihood of collocation (see page 243,
right column, line 23, lambda has a lowest value of 0);

identifying a set of n-grams having scores above a threshold (see page
243, right column, line 23, lambda has a lowest value of 0; a value of lambda
greater than zero identifies a compound word); and

adding the identified set of n-grams as compound tokens to the 
vocabulary (see page 245, right column, 2nd paragraph, line 7, compound list)
and removing constituent tokens that occur in the added compound tokens from 
the vocabulary (see page 244, left column, Relative Frequency Count paragraph, 
relative frequency is used to prevent when multiple occurrences occur). 
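
The segmentation and scoring limitations mapped above can be illustrated with a minimal Python sketch. The names `split_pairs` and `ngram_score` and the caller-supplied `likelihood` function are hypothetical; they appear in neither the claims nor the cited references, and the sketch reflects only one plausible reading of the claim language.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, ...]


def split_pairs(ngram: Sequence[str]) -> List[Tuple[Segment, Segment]]:
    """Divide an n-gram into its n-1 pairs of adjacent, non-empty segments."""
    tokens = tuple(ngram)
    return [(tokens[:i], tokens[i:]) for i in range(1, len(tokens))]


def ngram_score(
    ngram: Sequence[str],
    likelihood: Callable[[Segment, Segment], float],
) -> float:
    """Score an n-gram (n >= 2) by its lowest pairwise likelihood of collocation.

    `likelihood` is an assumed, caller-supplied measure; any association
    measure over two adjacent segments could be substituted here.
    """
    return min(likelihood(left, right) for left, right in split_pairs(ngram))
```

Under this reading, a trigram such as ("new", "york", "city") yields two splits, and the weaker of the two split points determines the score that is compared against the threshold.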

However, Su et al. does not specifically teach the use of iterating from n > 
2 down to n = 2 where n decreases by one each iteration and in each iteration 
performing the actions. It should be noted that Su et al. does suggest using 
window sizes of two or three for n-gram determination (see page 243, left
column, 1st paragraph).

Frantzi et al. does teach the use of iterating from n > 2 down to n = 2 where
n decreases by one each iteration and in each iteration performing the actions
(page 43, right column, "The algorithm," 2nd full paragraph, code underneath



Application/Control Number: 10/647,203 Page 6 

Art Unit: 2626 

and page 44, entire left column-right column, numbered item 5) (e.g. From the 
cited reference it is seen that the n-gram starts from some maximum limit and 
then proceeds to a lower order n-gram. The n-gram is decremented and takes 
into account the frequency of occurrence in order to determine a candidate 
collocation by the determination of a C value.) 
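
The backward iteration discussed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of either reference: the function name `find_compounds`, its parameters, and the caller-supplied `score` function are hypothetical, and the removal of constituent tokens from the vocabulary is omitted for brevity.

```python
from collections import Counter
from typing import Callable, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def find_compounds(
    tokens: Sequence[str],
    max_n: int,
    threshold: float,
    score: Callable[[NGram, Counter], float],
) -> Tuple[List[NGram], Set[str]]:
    """Iterate n from max_n down to 2, collecting n-grams scoring above threshold."""
    vocabulary = set(tokens)
    compounds: List[NGram] = []
    for n in range(max_n, 1, -1):  # n decreases by one each iteration
        counts = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
        for ngram in counts:
            if score(ngram, counts) > threshold:
                compounds.append(ngram)
                vocabulary.add(" ".join(ngram))  # add as a compound token
    return compounds, vocabulary
```

Because the loop starts at the largest n, longer candidate compounds are considered before the shorter n-grams nested inside them.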

It would have been obvious to one of ordinary skill in the art at the time
the invention was made to have modified the finding of compound words in a
corpus as taught by Su et al. with the backward iteration as taught by Frantzi et
al. The motivation to have combined the references involves the ability to
systematically determine the likelihood of collocation and extract the unextracted
collocations that occur (see Abstract), thus making the process automatic.

As to claims 61 and 66, it would have been obvious to one of ordinary
skill in the art to have implemented the method of claim 55 on a computer
readable medium whereby a processor executes the program from the medium.
Further, Su suggests the use of a computing system (see page 245, right
column, Simulation) and so does Frantzi (see page 43, right column, code on left
hand side).



As to claims 59, 64, and 69, Su et al. in view of Frantzi et al. teaches all of the
limitations as in claims 55, 61, and 66, above.

Furthermore, Su teaches where identifying a plurality of unique n-grams in
the text comprises skipping n-grams appearing in a list of known compounds
(see page 244, left column, Relative Frequency Count paragraph, relative
frequency is used to prevent when multiple occurrences occur, thus skipping
n-grams appearing in the list).

11. Claims 56, 58, 61, 62, 66 and 68 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Su et al. in view of Frantzi et al. as applied to claims 1, 6, 13, and 24
above, and further in view of Manning (The MIT Press 1999).

As to claims 56, 61, and 66, Su et al. in view of Frantzi et al. teaches all of the
limitations as in claims 1, 6, 13, and 24 above.

Furthermore, Su et al. teaches where the likelihood ratio $\lambda$ is computed
by: $\lambda = \frac{P(\mathbf{x} \mid M_c)\,P(M_c)}{P(\mathbf{x} \mid M_{nc})\,P(M_{nc})}$ (see Su et al., page 243, right column,
line 9 (equation)) (e.g. It should be noted that the reference uses a different
notation, but the same result and definitions are used, where the numerator is the
n-gram produced by a compound result and the denominator is the result
produced by a non-compound result. The formula can be changed to account for
various distributions (Gaussian, Binomial).)

However, Su et al. in view of Frantzi et al. do not specifically teach the
likelihood ratio given by $\lambda = L(H_i)/L(H_c)$.

Manning shows the use of the likelihood ratio (see equation 5.10) (e.g. The
equation is given in log form. The logs can be omitted to obtain the desired
formula. The numerator is the independence hypothesis and the denominator is
the dependence hypothesis.)
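
For illustration, a log likelihood ratio of this kind can be computed as in the following Python sketch. It uses the standard binomial formulation of the likelihood ratio test for bigrams; the function names and the count parameters `c1`, `c2`, `c12`, and `n` are assumptions chosen for this example and are not taken from the claims or from the cited passage.

```python
import math


def log_l(k: int, n: int, x: float) -> float:
    """Log of x**k * (1 - x)**(n - k), treating 0 * log(0) as 0."""
    def term(count: int, prob: float) -> float:
        return 0.0 if count == 0 else count * math.log(prob)
    return term(k, x) + term(n - k, 1.0 - x)


def log_likelihood_ratio(c1: int, c2: int, c12: int, n: int) -> float:
    """Log of L(independence) / L(dependence) for a bigram w1 w2.

    c1, c2: occurrences of w1 and w2; c12: occurrences of the bigram;
    n: total number of tokens. Requires 0 < c1 < n and consistent counts.
    """
    p = c2 / n                   # P(w2) under the independence hypothesis
    p1 = c12 / c1                # P(w2 | w1) under the dependence hypothesis
    p2 = (c2 - c12) / (n - c1)   # P(w2 | not w1) under the dependence hypothesis
    return (
        log_l(c12, c1, p) + log_l(c2 - c12, n - c1, p)
        - log_l(c12, c1, p1) - log_l(c2 - c12, n - c1, p2)
    )
```

The value is zero when the observed counts match independence exactly and becomes more negative as the bigram occurs more often than independence predicts; exponentiating it removes the logs and gives the ratio itself.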

It would have been obvious to one of ordinary skill in the art to have
modified the finding of compounds in a text corpus as taught by Su et al. and Frantzi
et al. with the formula as taught by Manning. The motivation to modify the former
is for collocation discovery (see Manning, page 172, sect. 5.3.4, 3rd paragraph,
lines 1-4).



As to claims 58, 63, and 68, Su et al. in view of Frantzi et al. teaches all of the
limitations as in claims 13, 16, and 17 above.

Su et al. in view of Frantzi teach a system for identifying compounds 
through measure of association. 

However, Su et al. in view of Frantzi do not specifically teach the
representation of the independence and collocation hypotheses.

Manning does teach the explanations of these two types of hypotheses
(see page 172, sect. 5.3.4, bullet items) (e.g. It should be noted that the
independence hypothesis is given by hypothesis 1 and the dependence or
collocation hypothesis by hypothesis 2. The $w^2$ and $w^1$ can be interpreted as the
tokens since the reference deals with a text corpus).

It would have been obvious to one of ordinary skill in the art to have
modified the finding of compound words in a text corpus as taught by Su et al.
and Frantzi et al. with the inclusion of the two hypotheses as taught by Manning.
The motivation to modify the former is for collocation discovery (see Manning,
page 172, sect. 5.3.4, 3rd paragraph, lines 1-4). Further, the use of the formula
presented by Manning would require an explanation of frequency for each type of
hypothesis in order to find the likelihood ratio (definition of likelihood ratio).

Allowable Subject Matter 

12. Claims 57, 62, and 67 are objected to as being dependent upon a rejected base 
claim, but would be allowable if rewritten in independent form including all of the 
limitations of the base claim and any intervening claims. 

13. The following is a statement of reasons for the indication of allowable subject
matter: none of the prior art references, alone or in combination, teaches or fairly
suggests the limitation "$L(H_c)$ is computed ... in accordance with
the formula: $\arg\max \frac{L(H_c)}{L(H_i)} = \frac{L(\text{n-gram forms compound})}{L(\text{n-gram does not form compound})}$" as seen in claims 19 and 30.

Conclusion 

14. Applicant's amendment necessitated the new ground(s) of rejection presented in this 
Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). 
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later
than SIX MONTHS from the date of this final action. 

15. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to PARAS SHAH whose telephone number is (571)270-1650.
The examiner can normally be reached on MON.-THURS. 7:00 a.m.-4:00 p.m.
EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Patrick Edouard, can be reached at (571)272-7603. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/P.S./

Examiner, Art Unit 2626 
11/06/2008 